Abstract

Zero Determinant (ZD) strategies are a new class of probabilistic and conditional strategies that 
are able to unilaterally set the expected payoff of an opponent in iterated plays of the Prisoner's 
Dilemma irrespective of the opponent's strategy, or else to set the ratio between a ZD player's and 
their opponent's expected payoff. Here we show that while ZD strategies are weakly dominant, they 
are not evolutionarily stable and will instead evolve into less coercive strategies. We show that ZD 
strategies with an informational advantage over other players that allows them to recognize other 
ZD strategies can be evolutionarily stable (and able to exploit other players). However, such an 
advantage is bound to be short-lived as opposing strategies evolve to counteract the recognition. 



Evolutionary Game Theory (EGT) has been around for over 30 years, but apparently the theory
still has surprises up its sleeve. Recently, Press and Dyson discovered a new class of strategies within
the realm of two-player iterated games that allow one player to unilaterally set the opponent's payoff,
or else extort the opponent to accept an unequal share of payoffs [1,2]. This new class of strategies,
named "Zero Determinant" or ZD strategies, exploits a curious mathematical property of the expected
payoff for a stochastic conditional "memory-one" strategy. In the standard game of EGT called the
Prisoner's Dilemma (PD), the possible moves are termed "Cooperate" (C) and "Defect" (D), as the
original objective of evolutionary game theory was to understand the evolution of cooperation [3-5]. As
opposed to deterministic strategies such as "Always Cooperate" or "Always Defect", stochastic strategies
are defined by probabilities to engage in one move or the other. "Memory-one" strategies make their
move depending on their own as well as their opponent's last move: perhaps the most famous of all
memory-one strategies within the iterated Prisoner's Dilemma (IPD) game, "Tit-for-Tat", plans its moves
as a function of only its opponent's last move. Memory-one probabilistic strategies are defined by four
probabilities, namely to cooperate given the four possible outcomes of the last play. While probabilistic
memory-one iterated games were studied as early as 1990 [6,7] and more recently by us [8], the existence
of ZD strategies still took the field by surprise (even though such strategies had in fact been discovered
earlier [9,10]).

The mathematical surprise offered up by Press and Dyson concerns the expected payoff to either
player, for a game that is iterated infinitely. Such games can be described by a Markov process defined
by the four probabilities that characterize each of the two players' strategies [1] (because this is an
infinitely repeated game, the probability to engage in the first move, which is unconditional, does not
play a role here). Each Markov process has a stationary state given by the left eigenvector of the Markov
matrix, which in this case describes the equilibrium of the process. The expected payoff is given by the
dot product of the stationary state and the payoff vector of the strategy. But while the stationary state is
the same for either player, the payoff vector (given by the score received for each of the four possible plays
CC, CD, DC, and DD) is different for the two players for the asymmetric plays CD and DC. Because
the expected payoff is a linear function of the payoffs for each of the four plays, it is possible for one 
strategy to enforce the payoff of the opponent by a judiciously chosen set of probabilities that makes the 
linear combination of determinants vanish (hence the name ZD strategies) . Note that this enforcement 
is asymmetric because of the asymmetry in the payoff vectors introduced earlier: while the ZD player 
can choose the opponent's payoff to depend only on their own probabilities, the payoff to the ZD player 
depends on both the ZD player's as well as the opponent's probabilities. This is the mathematical 
surprise: the expected payoff is usually a very complicated function of six probabilities (and four payoff 
values, for the four possible plays). When playing against the ZD strategy, the payoff that the opponent 
reaps is defined by the payoffs and only two remaining probabilities that characterize the ZD strategies. 

Let $p_1, p_2, p_3, p_4$ be the probabilities of the ZD player to cooperate given the outcomes CC, CD, DC, and
DD of the previous encounter, and $q_1, q_2, q_3, q_4$ the probabilities of any other non-ZD strategy (hereafter
the "O" strategy). Given the payoffs (R, S, T, P) for the four outcomes, the payoff to the ZD strategist's
opponent is determined entirely by the ZD strategist's probabilities:

$$E(\mathrm{O},\mathrm{ZD}) = f(\vec p) = \frac{(1-p_1)\,P + p_4\,R}{1 - p_1 + p_4}, \tag{1}$$

while the ZD strategist's payoff against O,

$$E(\mathrm{ZD},\mathrm{O}) = g(\vec p,\vec q), \tag{2}$$

is a complicated function of both ZD's and O's strategy that is too lengthy to write down here [but see
Methods for the mean payoff given the standard [3] PD values (R, S, T, P) = (3, 0, 5, 1)].
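
The stationary-state payoffs of Eqs. (1) and (2) can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it builds the Markov matrix over the outcomes (CC, CD, DC, DD) for two memory-one strategies, solves for the stationary distribution with exact fractions, and takes the dot product with each player's payoff vector. All function names are hypothetical. The ZD vector (0.99, 0.97, 0.02, 0.01) is the example strategy quoted in the caption of Fig. 3, and the payoffs are the standard (R, S, T, P) = (3, 0, 5, 1).

```python
from fractions import Fraction as F

def transition_matrix(p, q):
    """Markov matrix over outcomes (CC, CD, DC, DD), written from player X's side.

    p and q hold each player's probability to cooperate after CC, CD, DC, DD
    as seen by that player, so Y's entries for CD and DC are swapped.
    """
    q_seen_by_x = (q[0], q[2], q[1], q[3])
    return [
        [px * py, px * (1 - py), (1 - px) * py, (1 - px) * (1 - py)]
        for px, py in zip(p, q_seen_by_x)
    ]

def stationary(m):
    """Solve v M = v with sum(v) = 1 by exact Gauss-Jordan elimination.

    Assumes the chain has a unique stationary distribution.
    """
    n = len(m)
    # Row j encodes sum_i v_i (M_ij - delta_ij) = 0; the last row is
    # replaced by the normalisation sum_i v_i = 1.
    a = [[F(m[i][j]) - (1 if i == j else 0) for i in range(n)] for j in range(n)]
    a[-1] = [F(1)] * n
    b = [F(0)] * (n - 1) + [F(1)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
                b[r] -= factor * b[col]
    return [b[i] / a[i][i] for i in range(n)]

def expected_payoffs(p, q, R=3, S=0, T=5, P=1):
    """Return (payoff to X playing p, payoff to Y playing q)."""
    v = stationary(transition_matrix(p, q))
    to_x = v[0] * R + v[1] * S + v[2] * T + v[3] * P
    to_y = v[0] * R + v[1] * T + v[2] * S + v[3] * P
    return to_x, to_y

ZD = tuple(F(x) for x in ("0.99", "0.97", "0.02", "0.01"))
ALL_D = (F(0),) * 4
PAVLOV = (F(1), F(0), F(0), F(1))

for name, q in (("All-D", ALL_D), ("Pavlov", PAVLOV), ("ZD", ZD)):
    g, f = expected_payoffs(ZD, q)
    print(f"ZD vs {name}: ZD receives {g}, opponent receives {f}")
```

With these inputs the opponent's payoff is 2 in every pairing, while ZD itself receives 3/4 against All-D and 27/11 against Pavlov, in agreement with the values quoted in the Results.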

In Eqs. (1) and (2) we adopted the notation of a payoff matrix where the payoff is given to the "row
player". Note that the payoff that the ZD player forces upon its opponent is not necessarily smaller than
what the ZD player receives. For example, the payoff for ZD against the strategy "All-D" that defects
unconditionally at every move, $\vec q = (0, 0, 0, 0)$, is

$$E(\mathrm{ZD},\text{All-D}) = P + \frac{p_4}{1-p_1+p_4}\,\frac{(R-P)(S-P)}{T-P}, \tag{3}$$

which is strictly lower than the payoff of Eq. (1) that ZD forces upon All-D, for all games in the realm
of the PD parameters.

Interestingly, a ZD strategist can also extort an unfair share of the payoffs from the opponent, who
however could refuse it (turning the game into a version of the Ultimatum Game [12]). In extortionate
games, the strategy being preyed upon can increase its own payoff by modifying its own strategy q,
but this only increases the extortionate strategy's payoff. As a consequence, Press and Dyson conclude
that a ZD strategy will always dominate any opponent that adapts its own strategy to maximize its
payoff, for example by Darwinian evolution [1]. Here we show that ZD strategies (those that fix the
opponent's payoff as well as those that extort) are actually evolutionarily unstable, are easily outcompeted
by fairly common strategies, and quickly evolve to become non-ZD strategies. However, if ZD strategies
can determine who they are playing against (either by recognizing a tag or by analyzing the opponent's
response), ZD strategists are likely to be very powerful agents against unwitting opponents.

Results



In order to determine whether a strategy will succeed in a population, Maynard Smith proposed the
concept of an "Evolutionarily Stable Strategy" (or ESS) [4]. For a game involving arbitrary strategies I
and J, the ESS is easily determined by an inspection of the payoff matrix of the game as follows: I is an
ESS if the payoff E(I, I) when playing itself is larger than the payoff E(J, I) between any other strategy
J and I, i.e., I is an ESS if E(I, I) > E(J, I). In case E(I, I) = E(J, I), then I is an ESS if at the same
time E(I, J) > E(J, J). These equations teach us a fundamental lesson in evolutionary biology: it is not
sufficient for a strategy to outcompete another strategy in direct competition, rather, it must also play 
well against itself. The reason for this is that if a strategy plays well against an opponent but reaps less 
of a benefit competing against itself, then it will be able to invade a population but will quickly have to 
compete against its own offspring and its rate of expansion slows down. This is even more pronounced in 
populations with a spatial structure, where offspring are placed predominantly close to the progenitor. If 
the competing strategy in comparison plays very well against itself, then a strategy that only plays well 
against an opponent may not even be able to invade. 

If we assume that two opponents play a sufficiently large number of games, their average payoff
approaches the payoff of the Markov stationary state [1,11]. We can use this mean expected payoff as
the payoff to be used in the payoff matrix E that will determine the ESS. For ZD strategies playing O
(other) strategies, we know that ZD enforces E(O,ZD) = f(p) shown in Eq. (1), while ZD receives g(p,q)
[Eq. (2)]. But what are the diagonal entries in this matrix? We know that ZD enforces the score of Eq. (1)
regardless of the opponent's strategy, which implies that it also enforces this on another ZD strategist.
Thus, E(ZD,ZD) = f(p). The payoff of O against itself only depends on O's strategy q: E(O,O) = h(q), and
is the key variable in the game once the ZD strategy is fixed. The effective payoff matrix then becomes

$$E = \begin{array}{c|cc} & \mathrm{ZD} & \mathrm{O} \\ \hline \mathrm{ZD} & f(\vec p) & g(\vec p,\vec q) \\ \mathrm{O} & f(\vec p) & h(\vec q) \end{array} \tag{4}$$

The payoff matrix of any game can be brought into a normal form with vanishing diagonals without
affecting the competitive dynamics of the strategies by subtracting a constant term from each column [13],
so the effective payoff matrix is equivalent to

$$\begin{array}{c|cc} & \mathrm{ZD} & \mathrm{O} \\ \hline \mathrm{ZD} & 0 & g(\vec p,\vec q) - h(\vec q) \\ \mathrm{O} & 0 & 0 \end{array} \tag{5}$$

We notice that the fixed payoff f(p) has disappeared, and the winner of the competition is determined
entirely from the sign of g(p,q) - h(q), as seen by an inspection of the ESS equations. In principle, a
mixed strategy (a population mixture of two strategies that are in equilibrium) can be an ESS [I] but 
this is not possible here precisely because ZD enforces the same score on others as it does on itself. 



Evolutionary dynamics of ZD via replicator equations. 

ZD strategies are defined by setting two of the four probabilities in a strategy to specific values so that
the payoff E(O,ZD) depends only on the other two, but not on the strategy O. In Ref. [1], the authors
chose to fix $p_2$ and $p_3$, leaving $p_1$ and $p_4$ to define a family of ZD strategies. The requirement of a
vanishing determinant limits the possible values of $p_1$ (the probability to cooperate if in the previous
move both players cooperated) to close to 1, while $p_4$ must be near (but not equal to) zero. Let us
study an example ZD strategy defined by the values $p_1 = 0.99$ and $p_4 = 0.01$. The results we present
do not depend on the choice of the ZD strategy. An inspection of Eq. (1) shows that, if we use the
standard payoffs of the Prisoner's Dilemma (R, S, T, P) = (3, 0, 5, 1), then f(p) = 2. If we study the
strategy "All-D" (always defect, defined by the strategy vector q = (0, 0, 0, 0)) as opponent, we find that
g(p,q) = 0.75, while h(q) = 1. As a consequence, g(p,q) - h(q) is negative and All-D is the ESS, that
is, ZD will lose the evolutionary competition with All-D. However, this is not surprising as ZD's payoff
against All-D is in fact lower than what ZD forces All-D to accept, as mentioned earlier. But let us
consider the competition of ZD with the strategy "Pavlov", which is a strategy of "win-stay-lose-shift"
that can outperform the well-known "Tit-for-Tat" strategy in direct competition [14]. Pavlov is given by
the strategy vector $\vec q_{\mathrm{PAV}} = (1, 0, 0, 1)$, which (given the ZD strategy p described above and the standard
payoffs) returns E(ZD,PAV) = 27/11 ≈ 2.455 to the ZD player, while Pavlov is forced to receive f(p) = 2.
Thus, ZD wins every direct competition with Pavlov. Yet, Pavlov is the ESS because it cooperates with
itself, so h(q) = 3 and g(p,q) - h(q) = 27/11 - 3 is negative.
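
The two ESS verdicts above follow mechanically from Maynard Smith's conditions. As a minimal sketch (the helper names `is_ess` and `payoff_matrix` are hypothetical, and only the payoff values quoted in the text are used), the conditions can be applied to the effective 2x2 matrix built from f(p), g(p,q) and h(q):

```python
from fractions import Fraction as F

def is_ess(E, i, j):
    """Maynard Smith's conditions for strategy i against a single rival j.

    E[a][b] is the payoff to row strategy a when playing column strategy b.
    """
    if E[i][i] > E[j][i]:
        return True
    return E[i][i] == E[j][i] and E[i][j] > E[j][j]

def payoff_matrix(f, g, h):
    """Rows and columns ordered (ZD, O): [[E(ZD,ZD), E(ZD,O)], [E(O,ZD), E(O,O)]]."""
    return [[f, g], [f, h]]

ZD_IDX, O_IDX = 0, 1
vs_all_d = payoff_matrix(F(2), F(3, 4), F(1))      # f = 2, g = 0.75, h = 1
vs_pavlov = payoff_matrix(F(2), F(27, 11), F(3))   # f = 2, g = 27/11, h = 3

for name, E in (("All-D", vs_all_d), ("Pavlov", vs_pavlov)):
    print(name,
          "| ZD is ESS:", is_ess(E, ZD_IDX, O_IDX),
          "| opponent is ESS:", is_ess(E, O_IDX, ZD_IDX))
```

Because E(ZD,ZD) and E(O,ZD) are always tied at f(p), the first condition never decides in favor of ZD, and the outcome rests entirely on the comparison of g(p,q) with h(q).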

We can check that Pavlov is the ESS by following the population fractions as determined by the
replicator equations [5,13,15], which describe the frequencies of strategies in a population:

$$\dot\pi_i = \pi_i\,(w_i - \bar w), \tag{6}$$

where $\pi_i$ is the population fraction of strategy i, $w_i$ is the fitness of strategy i, and $\bar w$ is the average fitness
in the population. In our case, the fitness of strategy i is the mean payoff for this strategy, so

$$w_{\mathrm{ZD}} = \pi_{\mathrm{ZD}}\,E(\mathrm{ZD},\mathrm{ZD}) + \pi_{\mathrm{O}}\,E(\mathrm{ZD},\mathrm{O}), \tag{7}$$

$$w_{\mathrm{O}} = \pi_{\mathrm{ZD}}\,E(\mathrm{O},\mathrm{ZD}) + \pi_{\mathrm{O}}\,E(\mathrm{O},\mathrm{O}), \tag{8}$$

and $\bar w = \pi_{\mathrm{ZD}} w_{\mathrm{ZD}} + \pi_{\mathrm{O}} w_{\mathrm{O}}$. We show $\pi_{\mathrm{ZD}}$ and $\pi_{\mathrm{PAV}}$ (with $\pi_{\mathrm{ZD}} + \pi_{\mathrm{PAV}} = 1$) in Fig. 1 as a function of
time for different initial conditions, and confirm that Pavlov drives ZD to extinction regardless of initial
concentration.
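
The replicator dynamics for the ZD-Pavlov pair can be reproduced with a few lines of code. The following is a sketch only: the step size, the integration time, and the function names are assumptions made for illustration, and time is in arbitrary units. It integrates Eq. (6) for the ZD fraction with a classical fourth-order Runge-Kutta scheme, using the payoff values f(p) = 2, g(p,q) = 27/11 and h(q) = 3 quoted in the text.

```python
def replicator_rhs(x, E):
    """Time derivative of the ZD fraction x for two strategies (ZD, O)."""
    w_zd = x * E[0][0] + (1 - x) * E[0][1]   # fitness of ZD
    w_o = x * E[1][0] + (1 - x) * E[1][1]    # fitness of the opponent
    w_bar = x * w_zd + (1 - x) * w_o         # population average fitness
    return x * (w_zd - w_bar)

def integrate(x0, E, t_end=200.0, dt=0.01):
    """Classical fourth-order Runge-Kutta integration; returns the trajectory."""
    xs = [x0]
    x = x0
    for _ in range(int(round(t_end / dt))):
        k1 = replicator_rhs(x, E)
        k2 = replicator_rhs(x + 0.5 * dt * k1, E)
        k3 = replicator_rhs(x + 0.5 * dt * k2, E)
        k4 = replicator_rhs(x + dt * k3, E)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        xs.append(x)
    return xs

# Rows and columns ordered (ZD, Pavlov).
E_ZD_PAV = [[2.0, 27 / 11], [2.0, 3.0]]

for x0 in (0.1, 0.5, 0.9):
    trajectory = integrate(x0, E_ZD_PAV)
    print(f"initial ZD fraction {x0}: final ZD fraction {trajectory[-1]:.3e}")
```

For this payoff matrix the right-hand side reduces to -(6/11) x (1 - x)^2, which is negative for every interior x, so the ZD fraction declines from any starting mixture.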



Figure 1. Population fractions $\pi_{\mathrm{ZD}}$ (blue) and $\pi_{\mathrm{PAV}}$ (green) as a function of time for initial ZD
concentrations $\pi_{\mathrm{ZD}}(0)$ between 0.1 and 0.9.



Evolutionary dynamics of ZD in agent-based simulations. 

It could be argued that an analysis of evolutionary stability within the replicator equations ignores the
complex game play that occurs in populations where the payoff is determined in each game, and where
two strategies meet by chance and survive based on their accumulated fitness. We can test this by
following ZD strategies in contest with Pavlov in an agent-based simulation with a fixed population size
of $N_{\mathrm{pop}} = 1{,}024$ agents, a fixed replacement rate of 0.1% (see Methods) and using a fitness-proportional
selection scheme (a death-birth Moran process). Details of the simulation implementation are as published
earlier in Ref. [16].

In Fig. 2, we show the population fractions $\pi_{\mathrm{ZD}}$ and $\pi_{\mathrm{PAV}}$ for two different initial conditions [$\pi_{\mathrm{ZD}}(0)$ =
0.4 and 0.6], using a full agent-based simulation (solid lines) or using the replicator equations (dotted
lines). While the trajectories differ in detail (likely because in the agent-based simulations generations
overlap, the number of encounters is not infinite but dictated by the replacement rate, and payoffs are
accumulated over eight opponents randomly chosen from the population), the dynamics are qualitatively
the same. This can also be shown for any other stochastic strategy q. Note that in the agent-based
simulations, strategies have to play the first move unconditionally. We have set this "first move" probability
to $p_C = 0.5$ for both Pavlov and ZD.

Agent-based simulations thus corroborate what the replicator equations have already told us, namely
that ZD strategies have a hard time surviving in populations because they suffer from the same low
payoff that they impose on other strategies if faced with their own kind. However, ZD can win some
battles, in particular against strategies that cooperate. For example, the stochastic cooperator GC
["general cooperator", defined by p = (0.935, 0.229, 0.266, 0.42)] is the evolutionarily dominating strategy
(the fixed point) evolved at low mutation rates in Ref. [8]. GC is a cooperator that is very generous,
cooperating after mutual defection almost half the time. GC loses out (in the evolutionary sense) against
ZD because E(ZD,GC) = 2.125 while E(GC,GC) = 2.11, and ZD certainly wins (again in the evolutionary
sense) against the unconditional deterministic strategy "All-C" that always cooperates [see Eq. (14) in
the Methods]. If this is the case, how is it possible that GC is the evolutionary fixed point rather than
ZD, when strategies are free to evolve from random ancestors [8]?




Figure 2. Population fractions $\pi_{\mathrm{ZD}}$ (blue tones) and $\pi_{\mathrm{PAV}}$ (green tones) for two different initial
concentrations. The solid lines show the average of the population fraction from 40 agent-based
simulations as a function of evolutionary time measured in updates, while the dashed lines show the
corresponding replicator equations. Because time is measured differently in agent-based simulations as
opposed to the replicator equations, we adjusted the time scale of the Runge-Kutta simulation of
Eq. (6) to match the agent-based simulation.



Mutational instability of ZD strategies. 

To test how ZD fares in an experiment where strategies can evolve (in the previous sections, we only 
considered the competition between strategies that are fixed), we ran evolutionary (agent-based) simulations
in which strategies are encoded genetically. The genome itself evolves via random mutation and
fitness-proportional selection. For stochastic strategies, the probabilities are encoded in 5 genes (one
unconditional and four conditional probabilities drawn from a uniform distribution when mutated) and
evolved as described in the Methods and in Ref. [8]. Rather than starting the evolution experiments with
random strategies, we seeded them with the particular ZD strategy we have discussed here ($p_1 = 0.99$
and $p_4 = 0.01$). These experiments show that when we use a mutation rate that favors the strategy
GC as the fixed point, ZD evolves into it even though ZD outcompetes GC at zero mutation rate as we 
saw in the previous section. In Fig. 3, we show the four probabilities that define a strategy over the 
evolutionary line of descent, followed over 50,000 updates of the population (with a replacement rate of 
1%, this translates on average to 500 generations). The evolutionary line of descent (LOD) is created 
by taking one of the final genotypes that arose in the experiment, and following its ancestry backwards 
mutation by mutation, to arrive at the ZD ancestor used to seed the experiment [17]. (Because of the 
competitive exclusion principle [18], the individual LODs of all the final genotypes collapse to a single 
LOD with a fairly recent common ancestor). The LOD confirms what we had found earlier [8], namely 
that the evolutionary fixed points are independent of the starting strategy and simply reflect the optimal 
strategy given the amount of uncertainty (here introduced via mutations) in the environment. We thus 
conclude that ZD is unstable in another sense (besides not being an ESS): it is genetically or mutationally 
unstable, as mutations of ZD are likely not ZD, and we have shown earlier that ZD generally does not do 
well against other strategies that defect but are not ZD themselves. 

Figure 3. Evolution of probabilities $p_1$ (blue), $p_2$ (green), $p_3$ (red) and $p_4$ (teal) on the evolutionary
line of descent of a well-mixed population of 1,024 agents, seeded with the ZD strategy
$(p_1, p_2, p_3, p_4) = (0.99, 0.97, 0.02, 0.01)$. Lines of descent are averaged over 40 independent experiments,
as described in [8]. Mutation rate per gene $\mu = 1\%$, replacement rate r = 1%.



Stability of extortionate ZD strategies. 

Extortionate ZD strategies ("ZDe" strategies) are those that set the ratio between the ZD strategist's payoff
and that of a non-ZD opponent [1] rather than setting the opponent's absolute payoff. Against a ZDe strategy,
all the opponent can do (in a direct matchup) is to increase their own payoff by optimizing their strategy,
but as this increases ZDe's payoff commensurately, the ratio (set by an extortion factor $\chi$, where $\chi = 1$
represents a fair game) remains the same. Press and Dyson show that for ZDe strategies with extortion
factor $\chi$, the payoffs are



£(0,ZDc) = f^|r, (9) 

£(ZDc,0) = (10) 

which implies that E(ZDe, O) > E(O, ZDe) for all $\chi > 1$. However, ZDe plays terribly against other ZDe
strategies, which are defined by a set of probabilities given in Ref. Notably, ZDe strategies have $p_4 = 0$,
that is, they never cooperate after both opponents defect. It is easy to show that for $p_4 = 0$, the mean
payoff E(ZDe, ZDe) = P, that is, the payoff for mutual defection, because mutual defection is then an
absorbing state of the Markov process. As a consequence ZDe can never be
an ESS as E(O, ZDe) > E(ZDe, ZDe) for all finite $\chi > 1$, except when $\chi \to \infty$, where ZDe can be an ESS
along with an opponent's strategy that has a mean payoff E(O, O) not larger than P.

Given that ZD and ZDe are evolutionarily unstable against a large fraction of stochastic strategies, is
there no value to this strategy then? We will argue below that strategies that play ZD against non-ZD 
strategies but a different strategy (for example cooperation) against themselves, may very well be highly 
fit in the evolutionary sense, and emerge in appropriate evolution experiments. 

ZD strategies that can recognize other players. 

Clearly, winning against your opponents isn't everything if this impairs the payoff against similar or 
identical strategies. But what if a strategy could recognize who they play against, and switch strategies 
depending on the nature of the opponent? For example, such a strategy would play ZD against others, 
but cooperate with other ZD strategists instead. It is in principle possible to design strategies that use a 
(public or secret) tag to decide between strategies. Riolo et al. [19] designed a game where agents could 
donate costly resources only to players that were sufficiently similar to them (given a tag). This was 
later abstracted into a model in which players can use different payoff matrices (such as those for the 
Prisoner's Dilemma or the "Stag Hunt" game) depending on the tag of the opponent [20]. Recognizing 
another player's identity can in principle be accomplished in two ways: the players can simply record an 
opponent's tag and select a strategy accordingly [21], or they can try to recognize a strategy by probing
the opponent with particular plays. When using tags, it is possible that players cheat by imitating the 
tag of the opponent [22] (in that case it is necessary for players to agree on a new tag so that they can 
continue to reliably recognize each other). 

A tag-based ZD strategy ("ZDt") can cooperate with itself, while playing ZD against non-ZD players.
Let us first test that using tags confers stability against a strategy ZD loses to, namely All-D. The effective
payoff matrix becomes (using the standard payoff values and our example ZD strategy $p_1 = 0.99$, $p_4 = 0.01$)

$$E = \begin{array}{c|cc} & \mathrm{ZDt} & \text{All-D} \\ \hline \mathrm{ZDt} & 3 & 0.75 \\ \text{All-D} & 2 & 1 \end{array} \tag{11}$$



and we note that now both ZDt and All-D can be an ESS. The effective game belongs to the class of
coordination games (a typical example is the Stag Hunt game [23]), which means that the interior fixed
point $(\pi_{\mathrm{ZDt}}, \pi_{\text{All-D}}) = (0.2, 0.8)$ is itself unstable [13] and the winner of the competition depends on the
initial density of strategies. This is a favorable game for ZDt, as it will outcompete All-D as long as
its initial density exceeds 20% of the population. What happens if the opposing strategy acquires the
capacity to distinguish self from non-self as well? The optimal strategy in that case would defect against
ZDt players, but cooperate with itself. The effective payoff matrix then becomes ("CD" is the conditional
defector)

$$E = \begin{array}{c|cc} & \mathrm{ZDt} & \mathrm{CD} \\ \hline \mathrm{ZDt} & 3 & 0.75 \\ \mathrm{CD} & 2 & 3 \end{array} \tag{12}$$



This game is again in the class of coordination games, but the interior unstable fixed point is now
$(\pi_{\mathrm{ZDt}}, \pi_{\mathrm{CD}}) = (9/13, 4/13)$, which is not at all favorable for ZDt anymore as the strategy now needs to
constitute over 69% of the population in order to drive the conditional defector into extinction. We thus
find that tag-based play leads to dominance based on numbers (if players cooperate with their own kind),
where a tag-based ZD strategy is only favored if it is the only one that can recognize itself. Indeed,
tag-based recognition is used to enhance cooperation among animals via the so-called "green-beard"
effect [24,25], and can give rise to cycles between mutualism and altruism [26]. Recognizing a strategy
from behavior is discussed below. Note that whether a player's strategy is identified by a tag or learned
from interaction, in both cases it is communication that enables cooperation [8].
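
The two interior fixed points quoted above follow from equating the fitness of both strategies in the 2x2 payoff matrix. A minimal sketch of that calculation is given here, with hypothetical helper names and the matrix entries of Eqs. (11) and (12): for a matrix with rows (a, b) and (c, d), the fraction x of the first strategy at which both fitnesses agree satisfies x a + (1 - x) b = x c + (1 - x) d.

```python
from fractions import Fraction as F

def interior_fixed_point(E):
    """Fraction of strategy 0 at which both strategies have equal fitness.

    Returns None if no such point lies strictly between 0 and 1.
    """
    a, b = E[0]
    c, d = E[1]
    denom = (a - c) + (d - b)
    if denom == 0:
        return None
    x = F(d - b) / denom
    return x if 0 < x < 1 else None

def is_coordination_game(E):
    """True if each pure strategy is a strict best reply to itself."""
    return E[0][0] > E[1][0] and E[1][1] > E[0][1]

E11 = [[F(3), F(3, 4)], [F(2), F(1)]]   # ZDt versus All-D
E12 = [[F(3), F(3, 4)], [F(2), F(3)]]   # ZDt versus the conditional defector

for label, E in (("ZDt vs All-D", E11), ("ZDt vs CD", E12)):
    print(label, "| coordination game:", is_coordination_game(E),
          "| ZDt threshold:", interior_fixed_point(E))
```

In a coordination game this interior point is unstable, so it acts as the threshold density that the first strategy must exceed in order to take over the population.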

A shortest-memory player is unstable against longer-memory players.

In order to recognize a player's strategy via its actions, it is necessary to be able to send complex sequences 
of plays, and react conditionally on the opponent's actions. In order to be able to do this, a strategy must 
be able to use more than just the previous plays (memory-one strategy). This appears to contradict the
conclusion reached in [1] that the shortest-memory player sets the rule of the game. This conclusion was
reached by correctly noting that in a direct competition of a long-memory player and a short-memory
player, the payoff to both players is unchanged if the longer-memory player uses a "marginalized"
short-memory strategy. However, as we have seen earlier, in an evolutionary setting it is necessary to not
only take cross-strategy competitions into account, but also how the strategies fare when playing against 
themselves, that is, like-strategies. Then, it is clear that a long-memory strategy will be able to recognize 
itself (simply by noting that the responses are incompatible with a "marginal" strategy) and therefore 
distinguish itself from others. Thus, it appears possible that evolutionarily successful ZD strategies can 
be designed that use longer memories to distinguish self from non-self. Of course, such a strategy will 
be vulnerable to mutated strategies that look sufficiently like a ZD player to fool it, but subtly exploit it 
instead. 

Discussion 

ZD strategies are a new class of conditional stochastic strategies for the iterated PD (and likely other 
games as well) that are able to unilaterally set an opponent's payoff, or else set the ratio of payoffs 
between the ZD strategist and its opponent. The existence of such strategies is surprising, but they are 
not evolutionarily (or even mutationally) stable in adapting populations. Evolutionary stability can be 
determined by using the steady-state payoffs of two players engaged in an unlimited encounter as the 
one-shot payoff matrix between these strategies. Maynard Smith's standard ESS conditions applied to
that payoff matrix show that while ZD strategies are weakly dominant (their payoff against self is equal
to what any other strategy receives playing against them), it is the opposing strategy that is often the
ESS. (ZDe strategies are not even weakly dominant, except for the limiting case $\chi \to \infty$.) It is even
possible that ZD strategies win every single matchup against non-ZD strategies (such as against the
strategy "Pavlov"), yet be evolutionarily unstable and be driven to extinction.

While this argument relies on using the steady-state payoffs, it turns out that an agent-based simulation
with finite iterated games reproduces those results almost exactly. Furthermore, ZD strategies are
mutationally unstable even when they are the ESS at zero mutation rate, because the proliferation of ZD
mutants that are not exactly ZD creates an insurmountable obstacle to the evolutionary establishment of
ZD, which instead evolves into a harmless cooperating or defecting strategy (depending on the mutation
rate, see [8]).

For ZD strategists to stably and reliably outcompete other strategies, they have to have an informational
advantage. This "extra information" can be obtained by using a tag to recognize each other
and conditionally cooperate or play ZD depending on this tag, or by having a longer-memory strategy
that a player can use to probe the opponent's play. Tag-based play leads to an effective game in the
"coordination" class if players cooperate with themselves but not with the opponent, with the winner
determined by the initial density. Of course, such a tag- or information-based dominance is itself
vulnerable to the evolution of interfering mechanisms by the exploited strategies, either by imitating the
tag (and thus destroying the information channel) or by evolving longer memories themselves. Needless
to say, this type of evolutionary arms race has been, and will be, observed throughout the biosphere.

Methods 

Steady-state payoff to the ZD strategist 

If we set the payoff table to the standard values of the Prisoner's Dilemma, i.e., (R, S, T, P) = (3, 0, 5, 1),
the payoff of Eq. (2) received by the ZD strategist (defined by the pair $p_1$ and $p_4$) playing against a strategy
q can be calculated to be



g{p, q) = 

q 3 {7qi - 1) + 3q 2 (qi - 5q 3 - q 4 - 1) + 2q 4 (9q 3 - 5q 1 + 2) 



Pi 



Pi 
Pi 



g 2 (5g 3 + 6g 4 + 2) - 2q 3 {l'Sq 4 + 3) + qi (20q 4 - 6q 2 + q 3 + 4) 
q 2 {7 qi - Uq 3 - 9q 4 ~ 5) + q 3 (6 qi + 35g 4 - 5) - 2 94 (1 + 10<ft) - 2 
q 3 {Sq 2 -P4 + 20 Pi q 2 - 10) + qi(pi{9q 2 - AAq 3 - 8) + \2q 2 - 39<7 3 + 21) 
+qi (j3 4 (30q 4 - 9q 2 - 6q 3 + 2) + 2q 3 + 20q 4 - 8q 2 + 4) + 7p 4 q 2 + 4 
2q 2 (l + q x ) + 93 (4 - q x - 3q 2 ) - <? 4 (20 + 6q 2 - 13®,) - 4 1/ 



(Pi - Pi - 1) { 93 



qi +P4(qi - 3) - 4 



94 



p 4 (5qi - 6g 3 - 3) - 3q 3 + 5 



'12 



Pi 



p 4 (q 4 + 5q 3 - 6q 1 + 6) + q 4 + 3q 3 - 2q 2 - 2) 



gi(5g 4 + q 3 - 6q 2 + 4) + q 2 (q 4 + 5q 3 + 2) - 6q 3 (l + q 4 ) 



(13) 



Interesting limiting cases are the payoff against All-D [Eq. (3)], as well as the payoff against an
unconditional cooperator, given by

$$g(\vec p, \vec q = \vec 1) = 3 + \frac{4}{3}\left(\frac{1-p_1}{1-p_1+p_4}\right). \tag{14}$$

For the strategy Pavlov, using $\vec q_{\mathrm{PAV}} = (1, 0, 0, 1)$ in (13) yields

$$g(\vec p, \vec q_{\mathrm{PAV}}) = \frac{6\,\bigl(2(1-p_1) + p_4\bigr)^2}{\bigl(9(1-p_1) + 2p_4\bigr)\bigl(1-p_1+p_4\bigr)}, \tag{15}$$

or $g(\vec p, \vec q_{\mathrm{PAV}}) = 27/11$ for the ZD strategy with $p_1 = 0.99$ and $p_4 = 0.01$.
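
As a worked check of the numbers used in the Results, the example values $p_1 = 0.99$ and $p_4 = 0.01$ can be substituted into the closed forms. The block below reads Eq. (15) as $6(2(1-p_1)+p_4)^2 / [(9(1-p_1)+2p_4)(1-p_1+p_4)]$ and Eq. (1) as $[(1-p_1)P + p_4 R]/(1-p_1+p_4)$ with (R, P) = (3, 1); these readings are stated here as assumptions so that the arithmetic is self-contained.

```latex
\begin{align*}
f(\vec p) &= \frac{0.01 \cdot 1 + 0.01 \cdot 3}{0.01 + 0.01}
           = \frac{0.04}{0.02} = 2, \\[4pt]
g(\vec p, \vec q_{\mathrm{PAV}})
          &= \frac{6\,(2 \cdot 0.01 + 0.01)^2}{(9 \cdot 0.01 + 2 \cdot 0.01)(0.01 + 0.01)}
           = \frac{6 \cdot 0.0009}{0.11 \cdot 0.02}
           = \frac{0.0054}{0.0022} = \frac{27}{11} \approx 2.455 .
\end{align*}
```

Both values agree with those quoted for the ZD-Pavlov competition: Pavlov is held to a payoff of 2, while ZD receives 27/11.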

Agent-based modeling of iterated game play 

In order to study how conditional strategies such as ZD play in populations, we perform agent-based
simulations in which strategies compete against each other in a well-mixed population of 1,024 agents,
as described in more detail in Ref. [16]. Every update, an agent plays one move against eight random
other agents, and the payoffs accumulate until either of the playing partners is replaced. The fitness of
each player is the total payoff accumulated, averaged over all eight players it faced. Because we replace
0.1% of the population every update using a death-birth Moran process, on average an agent plays 500 
moves against the same opponent. Agents cooperate or defect with probability 0.5 on the first move 
as this decision is not encoded within the probabilistic set of four (conditional) probabilities. We start 
populations at fixed mixtures of strategies (for example, ZD versus Pavlov as in Fig. 2), and update the 
population until one of the strategies has gone to extinction. In such an implementation, there are no 
mutations and as a consequence strategies do not evolve. 
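
To give a feel for the death-birth Moran dynamics without reproducing the full simulation, here is a heavily simplified sketch. It is not the implementation of Ref. [16]: instead of accumulating payoffs move by move over eight opponents, it assigns each agent the stationary-state payoff averaged over the current population (a mean-field assumption), and it replaces one agent per event rather than 0.1% of the population per update. The function name, the default number of events, and the seed are all hypothetical choices.

```python
import random

def moran_zd_vs_pavlov(n_zd, n_pop=1024, events=300 * 1024, seed=1,
                       f=2.0, g=27 / 11, h=3.0):
    """Death-birth Moran process for ZD versus Pavlov; returns the ZD fraction
    after every replacement event (stops early at fixation or extinction)."""
    rng = random.Random(seed)
    history = [n_zd / n_pop]
    for _ in range(events):
        if n_zd in (0, n_pop):
            break
        x = n_zd / n_pop
        # Mean-field fitness: expected payoff against a random population member.
        w_zd = x * f + (1 - x) * g
        w_pav = x * f + (1 - x) * h
        # Death: one agent chosen uniformly at random is removed.
        dead_is_zd = rng.random() < x
        zd_left = n_zd - 1 if dead_is_zd else n_zd
        pav_left = (n_pop - n_zd) - (0 if dead_is_zd else 1)
        # Birth: a survivor is copied with probability proportional to fitness.
        p_zd_parent = zd_left * w_zd / (zd_left * w_zd + pav_left * w_pav)
        n_zd = zd_left + (1 if rng.random() < p_zd_parent else 0)
        history.append(n_zd / n_pop)
    return history

run = moran_zd_vs_pavlov(n_zd=512)
print(f"ZD fraction: start {run[0]:.2f}, end {run[-1]:.2f} after {len(run) - 1} events")
```

Even in this reduced form, selection acts against ZD whenever Pavlov is present, because Pavlov's fitness exceeds ZD's by (1 - x)(h - g) at every ZD fraction x below 1.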

Agent-based modeling of strategy evolution 

To simulate the Darwinian evolution of stochastic conditional strategies, we encode the strategy into five
loci, encoding the conditional probabilities $\vec p = (p_1, p_2, p_3, p_4)$ as well as the unconditional probability
$p_0$, as described in [8]. As in the simulation of iterated game play without mutation, agents play eight
randomly selected other agents in the (well-mixed) population. Removed agents are replaced by copies of
genotypes that survived the 1% removal per update, selected with a probability proportional to their fitness.
After the selection process, probabilities are mutated with a probability of 1% per locus. The mutated
probability is drawn from a uniformly distributed random number on the interval (0, 1). In order to
visualize the course of evolution, we reconstruct the ancestral line of descent (LOD) of a population
by retracing the path evolution took backwards, from the strategy with the highest fitness at the end
of the simulation back to the ancestral genotype that served as the seed. Because there is no sexual
recombination between strategies, each population has a single line of descent after moving past the most
recent common ancestor of the population. The line of descent recapitulates the evolutionary history
of that particular experiment, as it contains the sequence of mutations that gave rise to the successful
strategy at the end of the run (see, e.g., [17,27]). Because the evolutionary trajectory for any particular
locus is usually fairly noisy, we average trajectories over many replicate runs in order to capture the
selective pressures affecting each gene.
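
The per-locus mutation and the line-of-descent reconstruction described above can be sketched as follows. This is an illustration under assumptions, not the code of Ref. [8]: the `Genotype` class, the parent pointer, and the single unselected lineage in the usage example are hypothetical, and the first-move probability of 0.5 for the seed is an assumed value.

```python
import random

class Genotype:
    """Five loci (p0, p1, p2, p3, p4) plus a pointer to the parent genotype."""
    def __init__(self, probs, parent=None):
        self.probs = tuple(probs)
        self.parent = parent

def mutate(parent, rng, rate=0.01):
    """Copy `parent`; each locus is redrawn uniformly with probability `rate`."""
    probs = [rng.random() if rng.random() < rate else p for p in parent.probs]
    if tuple(probs) == parent.probs:
        return parent            # no mutation: offspring shares the genotype
    return Genotype(probs, parent)

def line_of_descent(final):
    """Follow parent pointers back to the seed; return the path seed -> final."""
    path = []
    node = final
    while node is not None:
        path.append(node)
        node = node.parent
    return path[::-1]

rng = random.Random(0)
seed_genotype = Genotype((0.5, 0.99, 0.97, 0.02, 0.01))   # p0 = 0.5 is assumed
current = seed_genotype
for _ in range(2000):            # a single lineage, without selection
    current = mutate(current, rng)

lod = line_of_descent(current)
print(len(lod) - 1, "mutational steps separate the final genotype from the seed")
```

Averaging a given locus along such paths over many replicate runs gives curves of the kind shown in Fig. 3. Note that `random.random()` samples the half-open interval [0, 1), which differs from the open interval (0, 1) only on a set of measure zero.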